Addressing the Curse of Imbalanced Training Sets: One-Sided Selection
Authors
Abstract
Adding examples of the majority class to the training set can have a detrimental effect on the learner's behavior: noisy or otherwise unreliable examples from the majority class can overwhelm the minority class. The paper discusses criteria to evaluate the utility of classifiers induced from such imbalanced training sets, explains the poor behavior of some learners under these circumstances, and suggests as a solution a simple technique called one-sided selection of examples.
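The abstract names one-sided selection but does not spell out the procedure. The following is a minimal sketch, assuming the commonly described form of the technique: keep every minority example, keep only those majority examples that a 1-nearest-neighbor rule misclassifies when it is seeded with the minority class plus one random majority example, then discard majority examples that take part in Tomek links (opposite-class pairs that are each other's nearest neighbor). The function names, the `(features, label)` data layout, and the `seed` parameter are illustrative assumptions, not taken from the paper.

```python
import math
import random


def nearest_index(examples, i):
    """Index of the example closest to examples[i], excluding i itself."""
    return min(
        (j for j in range(len(examples)) if j != i),
        key=lambda j: math.dist(examples[i][0], examples[j][0]),
    )


def one_sided_selection(examples, minority_label, seed=0):
    """Return a reduced training set; only majority examples are ever dropped.

    examples: list of (feature_tuple, label) pairs (hypothetical layout).
    """
    rng = random.Random(seed)
    minority = [ex for ex in examples if ex[1] == minority_label]
    majority = [ex for ex in examples if ex[1] != minority_label]
    if not minority or not majority:
        return list(examples)

    # Step 1: start from all minority examples plus one random majority example.
    start = minority + [rng.choice(majority)]

    # Step 2: add each majority example that 1-NN over `start` misclassifies.
    # Majority examples that are already classified correctly count as redundant.
    subset = list(start)
    for features, label in majority:
        predicted = min(start, key=lambda ex: math.dist(ex[0], features))[1]
        if predicted != label:
            subset.append((features, label))

    # Step 3: drop majority examples that form a Tomek link with a minority
    # example (the two are mutual nearest neighbors), treating them as
    # borderline or noisy.
    drop = set()
    for i, (_, label) in enumerate(subset):
        if label == minority_label:
            continue
        j = nearest_index(subset, i)
        if subset[j][1] == minority_label and nearest_index(subset, j) == i:
            drop.add(i)
    return [ex for i, ex in enumerate(subset) if i not in drop]


# Toy data: three minority points, one noisy majority point sitting among
# them, and a well-separated majority cluster.
data = [
    ((0, 0), 1), ((0, 1), 1), ((1, 0), 1),
    ((0.1, 0.1), 0),
    ((10, 10), 0), ((10, 11), 0), ((11, 10), 0), ((11, 11), 0), ((12, 12), 0),
]
reduced = one_sided_selection(data, minority_label=1)
print(len(data), "->", len(reduced))
```

In this toy set the minority class is always kept intact, while the noisy majority point next to the minority cluster is removed as part of a Tomek link. How many of the remaining majority points survive depends on which majority example is drawn as the random seed.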
Similar resources
Learning Greek Verb Complements: Addressing the Class Imbalance
Imbalanced training sets, where one class is heavily underrepresented compared to the others, have a bad effect on the classification of rare class instances. We apply One-sided Sampling for the first time to a lexical acquisition task (learning verb complements from Modern Greek corpora) to remove redundant and misleading training examples of verb nondependents and thereby balance our training...
Resampling Imbalanced Class and the Effectiveness of Feature Selection Methods for Heart Failure Dataset
Clinical datasets commonly have an imbalanced class distribution and high dimensional variables. Imbalanced class means that, in binary classification, one class (the majority) is represented by many more samples than the other (the minority) [1]. For example, in our research dataset there are 1459 instances classified as “Alive” while 485 are classified as “Dead”. Machine learning is gene...
A Novel One Sided Feature Selection Method for Imbalanced Text Classification
Imbalanced data can be seen in various areas such as text classification, credit card fraud detection, risk management, web page classification, image classification, medical diagnosis/monitoring, and biological data analysis. Classification algorithms tend to favor the large class and might even treat the minority class data as outliers. The text data is one of t...
Machine Learning Methods for High-Dimensional Imbalanced Biomedical Data
Learning from high dimensional biomedical data has attracted a lot of attention recently. High dimensional biomedical data often suffer from the curse of dimensionality and have imbalanced class distributions. Both of these features of biomedical data, high dimensionality and imbalanced class distributions, are challenging for traditional machine learning methods and may affect the model performance....